
Conversation


@LucasG0 LucasG0 commented Jul 24, 2025

IFC-1874

This PR makes the current locks more granular: instead of locking only on the object kind, we also lock on the values of the attributes contained within the uniqueness constraint. This way, we allow concurrent creation/updates of objects that have different values for these key attributes. We cannot do the same for relationships, as it would require fetching the relationship peers before locking.
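To make the idea concrete, here is a minimal, hypothetical sketch of attribute-granular lock names; the helper name and naming scheme are illustrative, not the actual Infrahub API:

```python
# Hypothetical sketch: derive lock names from the object kind plus the values
# of the attributes in its uniqueness constraint. Two creates with different
# constrained values then take different locks and can run concurrently.
def lock_names_for(kind: str, unique_attrs: dict[str, str]) -> list[str]:
    values = ".".join(str(v) for v in unique_attrs.values())
    # Fall back to a kind-only lock when no constrained attributes exist
    return [f"global.object.{kind}.{values}" if values else f"global.object.{kind}"]

print(lock_names_for("TestCar", {"name": "mercedes", "color": "blue"}))
# → ['global.object.TestCar.mercedes.blue']
```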

EDIT 2025-09-30: this PR also contains fixes for the resource managers to prevent duplicate allocations. Locking is now done during the mutation but also during the allocation of the resource, so making the locks "re-entrant" was required to avoid deadlocks within the same request.

Summary by CodeRabbit

  • New Features

    • Option to skip resource-pool allocation during object create/update.
    • Unified constraint-and-save operation that runs under managed multi-locks.
  • Refactor

    • Preview-before-persist flow to derive validation/save fields and locks before saving.
    • Centralized lock-name computation and multi-lock orchestration across mutations.
    • Exposed process_pools flag to control pool handling.
  • Bug Fixes

    • Branch-aware locking and improved reentrant lock behavior.
  • Tests

    • Added tests for lock-name generation and reentrant lock behavior.


coderabbitai bot commented Jul 24, 2025

Walkthrough

Adds a centralized run_constraints_and_save to validate and persist Node mutations and a lock_getter.get_lock_names_on_object_mutation to compute multi-lock names. Node creation converted to a preview-plus-persist two‑phase flow; _do_create_node now accepts a pre-initialized Node. GraphQL mutation handlers were changed to precompute and apply InfrahubMultiLock lock sets. Node pool handling gained process_pools/allocate_resources flags to opt out of pool allocation. Locking internals use ContextVar recursion tracking and a metrics flag for InfrahubLock/InfrahubMultiLock.
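The ContextVar-based recursion tracking mentioned above can be sketched as follows; this is a simplified standalone illustration (a single lock with one shared counter), not the actual InfrahubLock implementation:

```python
import asyncio
from contextvars import ContextVar

# Task-local recursion depth: ContextVar state is isolated per asyncio task,
# so a task can detect its own re-acquisition without blocking other tasks.
# One counter is enough for this single-lock sketch; a real implementation
# would track depth per lock name.
_depth: ContextVar[int] = ContextVar("lock_depth", default=0)

class ReentrantAsyncLock:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def __aenter__(self) -> "ReentrantAsyncLock":
        if _depth.get() == 0:
            await self._lock.acquire()  # first acquisition by this task
        _depth.set(_depth.get() + 1)
        return self

    async def __aexit__(self, *exc: object) -> None:
        _depth.set(_depth.get() - 1)
        if _depth.get() == 0:
            self._lock.release()  # release only on the outermost exit

async def main() -> None:
    lock = ReentrantAsyncLock()
    async with lock:          # acquired by the mutation
        async with lock:      # re-entered during resource allocation
            print("no deadlock")

asyncio.run(main())
```

This mirrors why reentrancy matters here: the mutation path and the resource allocation it triggers may both take the same lock within one request.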

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "Lock names include attributes values" directly and specifically describes the primary change in the changeset. The entire PR is centered around making lock identifiers more granular by incorporating attribute values (from uniqueness constraints) into lock names, which enables concurrent mutations of objects with different attribute values. This core objective drives all the significant changes including the new lock_getter module, updates to node creation and constraint validation flows, GraphQL mutation refactoring, and lock reentrancy support. The title is concise, clear, and specific enough that a developer reviewing the git history would immediately understand the main architectural change, even though it contains a minor grammatical imprecision ("attributes" should be singular "attribute").


@github-actions github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label Jul 24, 2025

codspeed-hq bot commented Jul 24, 2025

CodSpeed Performance Report

Merging #6888 will not alter performance

Comparing lgu-lock-on-attributes (174797c) with stable (0b1a94a)1

Summary

✅ 10 untouched

Footnotes

  1. No successful run was found on stable (f08a4fa) during the generation of this report, so 0b1a94a was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@LucasG0 LucasG0 force-pushed the lgu-lock-on-attributes branch 2 times, most recently from 6206bb8 to 2628683 Compare July 24, 2025 13:38
@LucasG0 LucasG0 marked this pull request as ready for review July 24, 2025 13:38
@LucasG0 LucasG0 requested a review from a team as a code owner July 24, 2025 13:38

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
backend/infrahub/core/node/save.py (1)

10-26: Add documentation and consider error handling.

The implementation looks good, but consider these improvements:

  1. Add a docstring explaining the function's purpose, parameters, and behavior
  2. Consider what happens if constraint checks fail - will the locks be properly released?

Add documentation:

 async def run_constraints_and_save(
     node: Node,
     node_constraint_runner: NodeConstraintRunner,
     fields_to_validate: list[str],
     fields_to_save: list[str],
     db: InfrahubDatabase,
     branch: Branch,
     skip_uniqueness_check: bool = False,
 ) -> None:
+    """Run constraint checks and save node within a locked context.
+    
+    This function acquires locks based on the node's schema and uniqueness constraints,
+    runs constraint validation, and saves the node to the database.
+    
+    Args:
+        node: The node to validate and save
+        node_constraint_runner: Runner for constraint checks
+        fields_to_validate: Fields to validate during constraint checking
+        fields_to_save: Fields to save (empty list saves all fields)
+        db: Database connection
+        branch: Branch context
+        skip_uniqueness_check: Whether to skip uniqueness constraint checks
+    
+    Raises:
+        Any exceptions from constraint checks or save operations
+    """
     schema_branch = db.schema.get_schema_branch(name=branch.name)
backend/infrahub/graphql/mutations/ipam.py (1)

122-131: Consider reducing code duplication in reconciliation calls.

The if/else pattern for conditional locking appears multiple times with nearly identical reconciliation calls. Consider extracting this pattern to reduce duplication.

Consider creating a helper method to encapsulate the conditional locking pattern:

+    @classmethod
+    async def _reconcile_with_optional_lock(
+        cls,
+        reconciler: IpamReconciler,
+        ip_value: IPv4Interface | ipaddress.IPv6Interface | ipaddress.IPv4Network | ipaddress.IPv6Network,
+        namespace: str,
+        node_uuid: str,
+        lock_name: str | None = None,
+        is_delete: bool = False,
+    ) -> Node:
+        """Run reconciliation with optional locking."""
+        reconcile_kwargs = {
+            "ip_value": ip_value,
+            "namespace": namespace,
+            "node_uuid": node_uuid,
+        }
+        # hasattr on a bound method cannot detect keyword parameters,
+        # so only forward is_delete when it is set
+        if is_delete:
+            reconcile_kwargs["is_delete"] = True
+            
+        if lock_name:
+            async with InfrahubMultiLock(lock_registry=lock.registry, locks=[lock_name]):
+                return await reconciler.reconcile(**reconcile_kwargs)
+        else:
+            return await reconciler.reconcile(**reconcile_kwargs)

     # Then use it in the methods:
-        if lock_name := cls._get_lock_name(namespace_id, branch):
-            async with InfrahubMultiLock(lock_registry=lock.registry, locks=[lock_name]):
-                reconciled_address = await reconciler.reconcile(
-                    ip_value=ip_address, namespace=namespace_id, node_uuid=address.get_id()
-                )
-        else:
-            reconciled_address = await reconciler.reconcile(
-                ip_value=ip_address, namespace=namespace_id, node_uuid=address.get_id()
-            )
+        lock_name = cls._get_lock_name(namespace_id, branch)
+        reconciled_address = await cls._reconcile_with_optional_lock(
+            reconciler=reconciler,
+            ip_value=ip_address,
+            namespace=namespace_id,
+            node_uuid=address.get_id(),
+            lock_name=lock_name,
+        )

Also applies to: 168-177

backend/infrahub/lock_utils.py (4)

47-47: Fix docstring to reference the correct constant name

The docstring references KINDS_TO_LOCK_ON_ANY_BRANCH but the actual constant is KINDS_CONCURRENT_MUTATIONS_NOT_ALLOWED.

-    Check whether kind or any kind generic is in KINDS_TO_LOCK_ON_ANY_BRANCH.
+    Check whether kind or any kind generic is in KINDS_CONCURRENT_MUTATIONS_NOT_ALLOWED.

16-16: Clarify the docstring about constraint override behavior

The docstring is misleading. It states "we only need to lock on the generic" but the implementation removes the node schema kind from the list while keeping both the node kind and generic kinds when appropriate.

-    it means node schema overrided this constraint, in which case we only need to lock on the generic.
+    it means node schema overrode this constraint, in which case we exclude the node schema kind from locking.

23-41: Potential issue with list manipulation logic

The use of pop(0) on line 40 assumes that kinds[0] is always node_schema.kind, but this is only guaranteed if node_schema.uniqueness_constraints is truthy (lines 23-25). If the logic changes in the future, this could lead to removing the wrong item.

Consider using explicit removal instead:

-                kinds.pop(0)
+                if node_schema.kind in kinds:
+                    kinds.remove(node_schema.kind)

66-66: Fix grammatical error in docstring

-    concurrent mutations are only allowed on non-main branch as objects validations will be performed at least when merging in main branch.
+    concurrent mutations are only allowed on non-main branch as object validations will be performed at least when merging in main branch.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99495f7 and 2628683.

📒 Files selected for processing (6)
  • backend/infrahub/core/node/create.py (2 hunks)
  • backend/infrahub/core/node/save.py (1 hunks)
  • backend/infrahub/graphql/mutations/ipam.py (8 hunks)
  • backend/infrahub/graphql/mutations/main.py (4 hunks)
  • backend/infrahub/lock_utils.py (1 hunks)
  • backend/tests/unit/core/test_get_kinds_lock.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
backend/**/*

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/core/node/create.py
  • backend/infrahub/core/node/save.py
  • backend/tests/unit/core/test_get_kinds_lock.py
  • backend/infrahub/graphql/mutations/main.py
  • backend/infrahub/graphql/mutations/ipam.py
  • backend/infrahub/lock_utils.py
backend/tests/**/*

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

Place backend tests in backend/tests/

Files:

  • backend/tests/unit/core/test_get_kinds_lock.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
PR: opsmill/infrahub#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-22T08:13:56.198Z
Learning: Backend implements the core graph-based infrastructure datastore, API, and business logic in `backend/infrahub/`
Learnt from: CR
PR: opsmill/infrahub#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-22T08:13:56.198Z
Learning: Infrahub offers a central hub to manage the data, templates and playbooks that powers your infrastructure by combining the version control and branch management capabilities similar to Git with the flexible data model and UI of a graph database
🔇 Additional comments (6)
backend/infrahub/core/node/create.py (1)

156-163: Good consolidation of constraint checking and saving logic.

The refactoring to use run_constraints_and_save improves code maintainability by encapsulating locking, constraint validation, and saving into a single atomic operation. This ensures better data integrity during node creation.

backend/tests/unit/core/test_get_kinds_lock.py (1)

39-83: Excellent test coverage for the new locking behavior.

The new tests comprehensively cover:

  • Branch-specific locking behavior (test_lock_other_branch)
  • Cross-branch locking requirements (test_lock_groups_on_other_branches)
  • Attribute-based uniqueness constraint lock names (test_lock_names_only_attributes)

The test structure is clear and covers important edge cases for the refactored locking logic.

backend/infrahub/graphql/mutations/ipam.py (2)

104-109: Good consolidation of lock name logic.

The _get_lock_name method properly encapsulates the logic for determining when locking is needed, returning None for non-default branches to avoid unnecessary locking.


259-264: Clear documentation of IP prefix locking requirements.

The comment explains why IP prefix mutations need locking on all branches due to cardinality-one relationships, which is helpful for understanding the different locking strategy compared to IP addresses.

backend/infrahub/graphql/mutations/main.py (2)

268-276: Excellent refactoring to use centralized constraint and save logic.

The replacement of separate constraint checking and saving with run_constraints_and_save improves code consistency and ensures that locking is properly handled at the core layer rather than in the GraphQL mutation layer.


146-151: Good removal of locking logic from wrapper methods.

The simplified wrapper methods now focus solely on their primary responsibility without concerning themselves with locking details. This improves separation of concerns and makes the code more maintainable.

Also applies to: 207-223


@ajtmccarty ajtmccarty left a comment


could you please add a PR description that explains why the changes in _get_kind_lock_names_on_object_mutation will help with the locking slowness?



usually utils means that a file contains a bunch of things that don't really belong together and I don't think that is the case here. I think this file can be locks.py or even lock_getter.py

return False


def _get_lock_names_on_object_mutation(node: Node, branch: Branch, schema_branch: SchemaBranch) -> list[str]:

at least one of these should be a public method and I think it's this one

Contributor Author

Right


fatih-acar commented Jul 24, 2025

Quick remark: we should be very careful with user input; in this case you are lucky that Redis keys are very permissive regarding the character set. I'm not sure that would be the case with all implementations.

Instead of using the attribute value directly, we could use some kind of hex hash of it.

Edit: not saying we should do it right now; I don't even know the performance implications of such a change.


LucasG0 commented Jul 24, 2025

@fatih-acar good point, I'll update to use the hash. I don't expect performance issues from that: hashing a string of 1 million characters takes less than 1 ms.
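As context for the hash choice: Python's built-in hash() is randomized per interpreter process (PYTHONHASHSEED), so it is unsuitable for lock names shared across service instances, whereas a hashlib digest is stable. A minimal sketch (the truncation length is an arbitrary illustrative choice):

```python
import hashlib

def stable_lock_fragment(value: str) -> str:
    # sha256 digests are deterministic across processes and restarts,
    # unlike the built-in hash(), which is seeded per process.
    return hashlib.sha256(value.encode()).hexdigest()[:16]

# The same value always yields the same fragment, on any instance
print(stable_lock_fragment("mercedes") == stable_lock_fragment("mercedes"))
# → True
```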

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
backend/infrahub/lock_getter.py (1)

63-109: ⚠️ Use a stable hash function instead of Python's built-in hash()

The implementation correctly uses hashing as discussed in the PR comments. However, Python's built-in hash() function is not stable across Python processes due to hash randomization. This could lead to different lock names for the same data across different service instances or restarts.

Consider using a cryptographic hash function like SHA-256 or a non-cryptographic but stable hash like xxhash.

+import hashlib
+
 def get_lock_names_on_object_mutation(node: Node, branch: Branch, schema_branch: SchemaBranch) -> list[str]:
     """
     Return lock names for object on which we want to avoid concurrent mutation (create/update). Except for some specific kinds,
     concurrent mutations are only allowed on non-main branch as objects validations will be performed at least when merging in main branch.
     Lock names include kind, some generic kinds, and values of attributes of corresponding uniqueness constraints.
     """
 
     if not branch.is_default and not _should_kind_be_locked_on_any_branch(node.get_kind(), schema_branch):
         return []
 
     lock_kinds = _get_kinds_to_lock_on_object_mutation(node.get_kind(), schema_branch)
     lock_names = []
     for kind in lock_kinds:
         schema = schema_branch.get(name=kind)
         ucs = schema.uniqueness_constraints
         if ucs is None:
             continue
 
         ucs_lock_names = []
         uc_attributes_names = set()
 
         for uc in ucs:
             uc_attributes_values = []
             # Keep only attributes constraints
             for field_path in uc:
                 # Some attributes may exist in different uniqueness constraints, we de-duplicate them
                 if field_path in uc_attributes_names:
                     continue
 
                 # Exclude relationships uniqueness constraints
                 schema_path = schema.parse_schema_path(path=field_path, schema=schema_branch)
                 if schema_path.related_schema is not None or schema_path.attribute_schema is None:
                     continue
 
                 uc_attributes_names.add(field_path)
-                value_hashed = str(hash(str(getattr(node, schema_path.attribute_schema.name).value)))
+                value_str = str(getattr(node, schema_path.attribute_schema.name).value)
+                value_hashed = hashlib.sha256(value_str.encode()).hexdigest()[:16]  # Use first 16 chars for brevity
                 uc_attributes_values.append(value_hashed)
 
             if uc_attributes_values:
                 uc_lock_name = ".".join(uc_attributes_values)
                 ucs_lock_names.append(uc_lock_name)
 
         partial_lock_name = kind + "." + ".".join(ucs_lock_names)
         lock_names.append(build_object_lock_name(partial_lock_name))
 
     return lock_names
🧹 Nitpick comments (2)
backend/tests/unit/core/test_get_kinds_lock.py (1)

64-83: Verify the exclusion of owner relationship from lock name

The uniqueness constraint includes three paths: ["name__value", "color__value", "owner__name"], but the expected lock name only includes hashes for "mercedes" and "blue", excluding the owner relationship.

Based on the PR objectives stating "this approach is not applied to relationships", this appears intentional. However, consider adding a comment in the test to clarify this behavior for future maintainers.

 car_person_schema_unregistered.nodes[0].uniqueness_constraints = [
     ["name__value", "color__value", "owner__name"]
 ]
 registry.schema.register_schema(schema=car_person_schema_unregistered, branch=default_branch.name)
 
 schema_branch = registry.schema.get_schema_branch(name=default_branch.name)
 person = await create_and_save(db=db, schema="TestPerson", name="John")
 car = await create_and_save(db=db, schema="TestCar", name="mercedes", color="blue", owner=person)
+# Note: owner__name is excluded from lock names as relationships are not included in lock calculations
 assert get_lock_names_on_object_mutation(car, branch=default_branch, schema_branch=schema_branch) == [
     "global.object.TestCar." + str(hash("mercedes")) + "." + str(hash("blue"))
 ]
backend/infrahub/lock_getter.py (1)

11-43: Consider refactoring the list modification logic for better clarity

The function logic is correct, but using pop(0) while building the list could be clearer. Consider building the final list without modification.

 def _get_kinds_to_lock_on_object_mutation(kind: str, schema_branch: SchemaBranch) -> list[str]:
     """
     Return kinds for which we want to lock during creating / updating an object of a given schema node.
     Lock should be performed on schema kind and its generics having a uniqueness_constraint defined.
     If a generic uniqueness constraint is the same as the node schema one,
     it means node schema overrided this constraint, in which case we only need to lock on the generic.
     """
 
     node_schema = schema_branch.get(name=kind)
 
     schema_uc = None
-    kinds = []
+    include_node_kind = False
     if node_schema.uniqueness_constraints:
-        kinds.append(node_schema.kind)
+        include_node_kind = True
         schema_uc = node_schema.uniqueness_constraints
 
     if isinstance(node_schema, GenericSchema):
-        return kinds
+        return [node_schema.kind] if include_node_kind else []
 
     generics_kinds = node_schema.inherit_from
 
-    node_schema_kind_removed = False
+    kinds = []
     for generic_kind in generics_kinds:
         generic_uc = schema_branch.get(name=generic_kind).uniqueness_constraints
         if generic_uc:
             kinds.append(generic_kind)
-            if not node_schema_kind_removed and generic_uc == schema_uc:
+            if include_node_kind and generic_uc == schema_uc:
                 # Check whether we should remove original schema kind as it simply overrides uniqueness_constraint
                 # of a generic
-                kinds.pop(0)
-                node_schema_kind_removed = True
+                include_node_kind = False
+    
+    if include_node_kind:
+        kinds.insert(0, node_schema.kind)
+    
     return kinds
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2628683 and f488755.

📒 Files selected for processing (3)
  • backend/infrahub/core/node/save.py (1 hunks)
  • backend/infrahub/lock_getter.py (1 hunks)
  • backend/tests/unit/core/test_get_kinds_lock.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/infrahub/core/node/save.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/tests/**/*

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

Place backend tests in backend/tests/

Files:

  • backend/tests/unit/core/test_get_kinds_lock.py
backend/**/*

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/tests/unit/core/test_get_kinds_lock.py
  • backend/infrahub/lock_getter.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: E2E-testing-version-upgrade / From 1.1.0
  • GitHub Check: backend-benchmark
  • GitHub Check: backend-docker-integration
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: validate-generated-documentation
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-tests-integration
  • GitHub Check: backend-tests-unit
🔇 Additional comments (6)
backend/tests/unit/core/test_get_kinds_lock.py (4)

1-12: Imports look good!

The new imports are appropriate for testing the refactored lock functionality.


16-38: Test coverage for kind locking logic is comprehensive!

The test properly covers different scenarios: inherited uniqueness constraints, generic relationships, and cases with no constraints.


39-51: Good test for branch-specific locking behavior!

This test correctly verifies that objects on non-default branches return empty lock names, which is the expected behavior.


52-63: No changes required: CoreGraphQLQueryGroup is correctly locked via inheritance

The CoreGraphQLQueryGroup schema definition inherits from InfrahubKind.GENERICGROUP, so _should_kind_be_locked_on_any_branch("CoreGraphQLQueryGroup", …) rightly returns True.
No update needed here.

backend/infrahub/lock_getter.py (2)

1-9: Imports and constant definition look good!

The constant clearly identifies kinds that require special locking behavior across all branches.


45-61: Clean implementation for branch-agnostic locking check!

The function efficiently checks both the kind and its inherited kinds.

@LucasG0 LucasG0 requested a review from ajtmccarty July 24, 2025 16:16

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/infrahub/lock_getter.py (2)

37-41: Consider using a more explicit approach for removing the node schema kind.

The use of kinds.pop(0) assumes that the node schema kind is always at index 0. While this is currently true due to the order of operations, it makes the code fragile and dependent on ordering.

Consider storing the node schema kind separately and removing it explicitly:

     node_schema_kind_removed = False
     for generic_kind in generics_kinds:
         generic_uc = schema_branch.get(name=generic_kind).uniqueness_constraints
         if generic_uc:
             kinds.append(generic_kind)
             if not node_schema_kind_removed and generic_uc == schema_uc:
                 # Check whether we should remove original schema kind as it simply overrides uniqueness_constraint
                 # of a generic
-                kinds.pop(0)
+                if node_schema.kind in kinds:
+                    kinds.remove(node_schema.kind)
                 node_schema_kind_removed = True

112-113: Handle edge case when all uniqueness constraints are relationship-based.

When all uniqueness constraints are relationship-based (and thus skipped), ucs_lock_names will be empty, resulting in lock names with consecutive dots like "kind..".

Consider handling this edge case:

-        partial_lock_name = kind + "." + ".".join(ucs_lock_names)
+        if ucs_lock_names:
+            partial_lock_name = kind + "." + ".".join(ucs_lock_names)
+        else:
+            partial_lock_name = kind

This ensures consistent lock name formatting.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f488755 and 3a7793a.

📒 Files selected for processing (2)
  • backend/infrahub/lock_getter.py (1 hunks)
  • backend/tests/unit/core/test_get_kinds_lock.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/tests/unit/core/test_get_kinds_lock.py
🧰 Additional context used
📓 Path-based instructions (1)
backend/**/*

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/lock_getter.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: E2E-testing-version-upgrade / From 1.1.0
  • GitHub Check: backend-docker-integration
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: validate-generated-documentation
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-benchmark
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: backend-tests-unit
  • GitHub Check: backend-tests-integration
🔇 Additional comments (2)
backend/infrahub/lock_getter.py (2)

1-9: LGTM!

Imports and constant definition are appropriate for the module's functionality.


45-61: LGTM!

The function correctly checks if a kind or its generics require locking on any branch.

@LucasG0 LucasG0 force-pushed the lgu-lock-on-attributes branch from e973da4 to 5f677ea Compare July 25, 2025 08:40
@LucasG0 LucasG0 requested a review from a team July 25, 2025 08:56

fatih-acar commented Jul 25, 2025

I think there's an issue with objects depending on a resource manager such as NumberPool... I end up with duplicate numbers allocated.

I think we still need to lock exclusively on the kind when the object uses a resource manager...

Edit: this issue is NOT related to #6866

@fatih-acar

> I think there's an issue with objects depending on a resource manager such as NumberPool... I end up with duplicate numbers allocated.
>
> I think we still need to lock exclusively on the kind when the object uses a resource manager...
>
> Edit: this issue is NOT related to #6866

#7025 is not enough to fix the issue because the get_resource() call is done within a DB transaction when we're inside the Create/Upsert mutation path...
One solution would be to apply the same filter we use for uniqueness constraints to attributes that depend on a resource manager as well.

@LucasG0 LucasG0 closed this Sep 17, 2025
@LucasG0 LucasG0 reopened this Sep 17, 2025
@fatih-acar fatih-acar force-pushed the lgu-lock-on-attributes branch 2 times, most recently from 80f493f to 40b213b Compare September 19, 2025 08:42
@fatih-acar fatih-acar force-pushed the lgu-lock-on-attributes branch 3 times, most recently from b8a059b to 1c1a372 Compare September 30, 2025 12:34
@fatih-acar

Fixes #7254

@fatih-acar

We can probably re-enable locking within branches now that they are more granular?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 746a0d1 and 537cc7f.

📒 Files selected for processing (3)
  • backend/infrahub/graphql/mutations/ipam.py (10 hunks)
  • backend/infrahub/lock.py (3 hunks)
  • backend/infrahub/lock_getter.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/lock_getter.py
  • backend/infrahub/lock.py
  • backend/infrahub/graphql/mutations/ipam.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

**/*.py: Use type hints for all function parameters and return values in Python code
Prefer asynchronous implementations when feasible (use Async whenever possible)
Define asynchronous functions with async def
Use await for asynchronous calls
Use Pydantic models instead of standard library dataclasses
All Python functions must include a docstring
Use Ruff and MyPy for linting and type checking

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Use async def for asynchronous functions
Use await for asynchronous calls within async functions
Use Pydantic models instead of dataclasses for data modeling
All Python functions must have docstrings
Docstrings must use triple quotes (""")
Follow Google-style docstring format
Docstrings should include, when applicable: brief description, detailed description, Args/Parameters (without types), Returns, Raises, and Examples
Validate and lint Python with ruff and mypy
Format Python code using the project’s formatter (invoke format)

Files:

  • backend/infrahub/lock_getter.py
  • backend/infrahub/lock.py
  • backend/infrahub/graphql/mutations/ipam.py
🧬 Code graph analysis (3)
backend/infrahub/lock_getter.py (4)
backend/infrahub/core/schema/schema_branch.py (1)
  • SchemaBranch (72-2314)
backend/infrahub/lock.py (2)
  • build_object_lock_name (283-284)
  • get (254-260)
backend/infrahub/core/relationship/model.py (2)
  • RelationshipManager (718-1242)
  • pop (678-684)
backend/infrahub/core/schema/basenode_schema.py (1)
  • parse_schema_path (426-474)
backend/infrahub/lock.py (1)
backend/infrahub/core/timestamp.py (1)
  • current_timestamp (33-34)
backend/infrahub/graphql/mutations/ipam.py (5)
backend/infrahub/core/node/create.py (1)
  • get_profile_ids (116-120)
backend/infrahub/lock.py (2)
  • InfrahubMultiLock (50-74)
  • build_object_lock_name (283-284)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (54-125)
backend/infrahub/core/node/__init__.py (2)
  • Node (72-1062)
  • from_graphql (966-994)
backend/infrahub/graphql/mutations/main.py (1)
  • mutate_update_object (282-356)
🪛 GitHub Actions: CI
backend/infrahub/lock_getter.py

[error] 54-54: Ruff: ARG001 Unused function argument: branch.

🔇 Additional comments (11)
backend/infrahub/lock.py (4)

8-8: LGTM!

The ContextVar import is necessary for implementing re-entrant locking semantics.


136-136: LGTM!

The ContextVar initialization correctly provides per-context recursion tracking with a unique name to avoid collisions between different lock instances.


159-172: LGTM!

The re-entrant acquire logic is correct: it increments the recursion depth if the lock is already held in the current context and returns early without performing a duplicate acquisition. The outermost acquire initializes the depth to 1 and records metrics appropriately.


174-194: LGTM!

The re-entrant release logic correctly handles:

  • Error case: raises RuntimeError if releasing without ownership
  • Nested releases: decrements depth without releasing the underlying lock
  • Final release: records metrics, releases the underlying lock, and clears the recursion context

The implementation ensures that metrics are only recorded for the outermost acquisition/release pair.

backend/infrahub/lock_getter.py (3)

14-45: LGTM!

The function correctly identifies kinds requiring locks by examining uniqueness constraints on both the node schema and its generics. The logic to remove the node kind when it simply overrides a generic's constraint (lines 40-44) prevents duplicate locking and is well-documented.


48-51: LGTM!

The use of SHA-256 ensures consistent lock names across different processes, which is essential for distributed locking. The comment clearly explains why Python's built-in hash() is unsuitable.


81-125: LGTM!

The uniqueness constraint processing correctly:

  • Deduplicates attributes across multiple constraints (lines 96-97)
  • Excludes relationship constraints as intended per PR objectives (lines 100-102)
  • Handles optional/None attributes gracefully (lines 106-109)
  • Skips kinds with no qualifying attributes to prevent malformed lock names (lines 119-120)
  • Builds hierarchical lock names from hashed attribute values (lines 122-123)

The implementation addresses previous review feedback and handles edge cases appropriately.

backend/infrahub/graphql/mutations/ipam.py (4)

131-152: LGTM, pending signature fix.

The refactored mutate_create correctly:

  • Applies @retry_db_transaction decorator for resilience
  • Acquires locks before opening the transaction (lines 145-146)
  • Performs reconciliation within the transaction scope (lines 147-149)

The structure is sound once the signature inconsistency on line 144 is resolved.


292-332: LGTM!

The IPPrefix mutation correctly:

  • Documents the cardinality-one relationship locking requirement (lines 293-295)
  • Acquires namespace-level lock before the transaction (line 325)
  • Performs reconciliation within the transaction scope (lines 327-329)
  • Applies @retry_db_transaction decorator for resilience

The implementation is consistent and well-structured.


366-484: LGTM!

Both mutate_update and mutate_delete correctly implement the centralized locking pattern:

  • mutate_update: nested locks (namespace → object) wrap the transaction and reconciliation (lines 402-417)
  • mutate_delete: namespace lock wraps the transaction and reconciliation (lines 477-481)

The lock acquisition and transaction boundaries are properly aligned, ensuring data consistency during concurrent operations.


189-242: Ensure _get_lock_names(namespace_id, branch) signature matches its call. Lock ordering across IPAM update mutations is consistent (namespace → object → transaction).

@fatih-acar fatih-acar force-pushed the lgu-lock-on-attributes branch 4 times, most recently from 7b6f018 to 3aa9ba8 Compare September 30, 2025 20:24
Copy link
Contributor

@ajtmccarty ajtmccarty left a comment


some questions and comments (mostly on things you did not write)
and a couple more questions

  1. is it possible to add any sort of graphql-level tests that actually cover the locks as they will be used? seems like it would be hard, but maybe it's possible by creating/updating overlapping objects and checking timestamps of the changes or something
  2. is this change big enough to be on develop instead of stable?

Comment on lines 54 to 57
"""
Return lock names for object on which we want to avoid concurrent mutation (create/update). Except for some specific kinds,
concurrent mutations are only allowed on non-main branch as objects validations will be performed at least when merging in main branch.
Lock names include kind, some generic kinds, and values of attributes of corresponding uniqueness constraints.
"""
Copy link
Contributor


pretty sure this is leftover from months ago, but I don't think this is accurate any longer
also, I don't think this docstring belongs here as it references logic way outside of its scope

Copy link
Contributor


Indeed, let me know what you think of the new version

Copy link
Contributor


is this all to support the "re-entrant" locks you mentioned in the description? meaning a lock that can be acquired multiple times?
how does this prevent different workers from acquiring the same lock if the lock can be acquired more than once?

Copy link
Contributor

@fatih-acar fatih-acar Oct 1, 2025


Yes exactly
We track this behavior using a counter within a ContextVar, which gives us an asyncio-context-scoped counter (similar to TLS (thread-local storage), but for asyncio coroutines; see https://peps.python.org/pep-0567/).

A different worker necessarily runs in another coroutine, so its refcount is 0 and it will try to acquire the global lock, which is held by the first worker.
A different coroutine within the same worker will likewise try to acquire the lock (for example, when a concurrent GraphQL mutation runs on the same worker).
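A minimal sketch of this re-entrant pattern, using a ContextVar-backed depth counter (names are illustrative, not the actual InfrahubLock implementation):

```python
import asyncio
from contextvars import ContextVar


class ReentrantAsyncLock:
    """Async lock that can be re-acquired within the same asyncio context.

    A ContextVar tracks the recursion depth per context (PEP 567), so nested
    acquires in one request don't deadlock, while other coroutines or workers
    (which run in their own context, depth 0) still block on the real lock.
    """

    def __init__(self, name: str) -> None:
        self._lock = asyncio.Lock()
        self._depth: ContextVar[int] = ContextVar(f"lock_depth_{name}", default=0)

    async def __aenter__(self) -> "ReentrantAsyncLock":
        depth = self._depth.get()
        if depth > 0:  # already held in this context: just bump the counter
            self._depth.set(depth + 1)
            return self
        await self._lock.acquire()  # outermost acquire takes the real lock
        self._depth.set(1)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        depth = self._depth.get()
        if depth == 0:
            raise RuntimeError("release of a lock not held in this context")
        self._depth.set(depth - 1)
        if depth == 1:  # outermost release frees the real lock
            self._lock.release()


async def _demo() -> None:
    lock = ReentrantAsyncLock("pool")
    async with lock:  # e.g. taken by the mutation handler
        async with lock:  # e.g. taken again inside get_resource()
            print("nested acquire OK, no deadlock")


asyncio.run(_demo())
```

Note that metrics would only be recorded on the outermost acquire/release pair, matching the behavior described in the review above.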

Copy link
Contributor


The main goal is just to fix the following call stack:
Create mutation holds resource pool lock -> create new node internally using Node.new() -> triggers resource pool get_resource() -> get_resource locks again using the same resource pool id -> deadlock

We introduced the lock inside get_resource (on the same resource pool id) in #7025 to guard direct mutation calls to the resource pools...
A cleaner solution would be to move the lock from the get_resource call up to the different top-level mutation handlers.

Comment on lines +214 to +215
fields_to_validate = list(data)
fields = list(data.keys())
Copy link
Contributor


are these different?

Copy link
Contributor


I believe fields_to_validate is used for the constraints checker, this is what I understand from the previous state of the code. So yes they are different and required?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 537cc7f and 9eefa2f.

📒 Files selected for processing (6)
  • backend/infrahub/core/node/create.py (4 hunks)
  • backend/infrahub/core/node/save.py (1 hunks)
  • backend/infrahub/graphql/mutations/ipam.py (10 hunks)
  • backend/infrahub/graphql/mutations/main.py (5 hunks)
  • backend/infrahub/lock_getter.py (1 hunks)
  • backend/tests/unit/core/test_get_kinds_lock.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/infrahub/lock_getter.py
  • backend/infrahub/core/node/save.py
🧰 Additional context used
📓 Path-based instructions (3)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/core/node/create.py
  • backend/tests/unit/core/test_get_kinds_lock.py
  • backend/infrahub/graphql/mutations/ipam.py
  • backend/infrahub/graphql/mutations/main.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

(same Python docstring and typing guidelines as listed in the previous review)

Files:

  • backend/infrahub/core/node/create.py
  • backend/tests/unit/core/test_get_kinds_lock.py
  • backend/infrahub/graphql/mutations/ipam.py
  • backend/infrahub/graphql/mutations/main.py
backend/tests/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Place backend tests in backend/tests/

Files:

  • backend/tests/unit/core/test_get_kinds_lock.py
🧬 Code graph analysis (4)
backend/infrahub/core/node/create.py (6)
backend/infrahub/core/constraint/node/runner.py (1)
  • NodeConstraintRunner (13-43)
backend/infrahub/core/node/__init__.py (6)
  • Node (72-1062)
  • save (814-824)
  • init (213-219)
  • init (223-229)
  • init (232-258)
  • new (697-716)
backend/infrahub/core/node/save.py (1)
  • run_constraints_and_save (12-57)
backend/infrahub/dependencies/registry.py (1)
  • get_component_registry (61-62)
backend/infrahub/lock.py (1)
  • InfrahubMultiLock (50-74)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (53-118)
backend/tests/unit/core/test_get_kinds_lock.py (3)
backend/infrahub/lock_getter.py (3)
  • _get_kinds_to_lock_on_object_mutation (13-44)
  • _hash (47-50)
  • get_lock_names_on_object_mutation (53-118)
backend/tests/node_creation.py (1)
  • create_and_save (8-12)
backend/tests/conftest.py (1)
  • car_person_schema_unregistered (497-563)
backend/infrahub/graphql/mutations/ipam.py (5)
backend/infrahub/core/node/create.py (1)
  • get_profile_ids (116-120)
backend/infrahub/database/__init__.py (3)
  • InfrahubDatabase (141-458)
  • start_transaction (241-254)
  • get_schema_branch (130-138)
backend/infrahub/lock.py (2)
  • InfrahubMultiLock (50-74)
  • build_object_lock_name (283-284)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (53-118)
backend/infrahub/graphql/mutations/main.py (3)
  • mutate_create_object (173-190)
  • mutate_update_object (280-354)
  • mutate_update_to_graphql (357-368)
backend/infrahub/graphql/mutations/main.py (5)
backend/infrahub/core/node/__init__.py (4)
  • save (814-824)
  • get_kind (89-91)
  • from_graphql (966-994)
  • Node (72-1062)
backend/infrahub/core/node/save.py (1)
  • run_constraints_and_save (12-57)
backend/infrahub/lock.py (1)
  • InfrahubMultiLock (50-74)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (53-118)
backend/infrahub/core/node/create.py (2)
  • get_profile_ids (116-120)
  • refresh_for_profile_update (123-144)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: E2E-testing-version-upgrade / From 1.3.0
  • GitHub Check: validate-generated-documentation
  • GitHub Check: backend-benchmark
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: backend-docker-integration
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: backend-tests-integration
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-tests-unit
🔇 Additional comments (23)
backend/infrahub/core/node/create.py (3)

3-15: LGTM: Import additions support the new locking flow.

The new imports (Sequence, run_constraints_and_save, InfrahubMultiLock, get_lock_names_on_object_mutation) are necessary for the centralized constraint validation and multi-lock implementation.


193-220: Well-designed two-phase creation flow with preview and persistence.

The refactor correctly separates lock computation (preview phase with process_pools=False to avoid premature pool allocation) from actual persistence (fresh node initialization within multi-lock context). This ensures locks are computed based on final values without race conditions.


155-164: No change needed: passing fields_to_save=[] is correct for node creation. The save() method calls _create when _existing is false—ignoring the fields parameter and persisting all fields—so an empty list has no impact on creation.

Likely an incorrect or invalid review comment.

backend/infrahub/graphql/mutations/main.py (6)

32-34: LGTM!

The imports correctly introduce the centralized constraint-and-save helper, multi-lock support, and lock name derivation for the new mutation flow.


222-231: LGTM!

The preview object pattern correctly computes locks without triggering pool allocations, aligning with the PR's goal to prevent duplicate resource allocations during lock computation.


233-256: LGTM!

The nested function pattern cleanly scopes the mutation within the multi-lock context, correctly handling both transactional and non-transactional database connections.


280-354: LGTM!

The function signature is well-extended with comprehensive documentation. The delegation to run_constraints_and_save correctly centralizes constraint validation and persistence logic, and profile refresh is correctly handled.


514-515: LGTM!

The helper correctly extracts field keys excluding immutable identifiers.


214-219: Ensure fields_to_validate excludes id and hfid
Remove id and hfid from the list of fields passed to validation (currently only stripped from the save list) or clarify why immutable identifiers should still run through constraints.

backend/tests/unit/core/test_get_kinds_lock.py (3)

1-12: LGTM!

The imports correctly introduce the necessary utilities for testing lock name derivation with the new centralized locking approach.


39-52: LGTM!

The test correctly validates branch-aware lock name generation, ensuring locks are properly scoped to the branch context.


54-72: LGTM!

The test correctly validates that lock names include only attribute values from uniqueness constraints, excluding relationships as intended by the lock derivation logic.

backend/infrahub/graphql/mutations/ipam.py (11)

15-22: LGTM!

The imports correctly introduce profile ID tracking and lock name derivation for the centralized locking flow.


110-111: LGTM!

The helper correctly derives a namespace-scoped lock name for IPADDRESS mutations.


132-152: LGTM!

The create flow correctly acquires namespace-scoped locks before performing transactional creation and reconciliation.


155-186: LGTM!

The helper correctly propagates lock names and field data to mutate_update_object, with manage_lock=False and apply_data=False delegating lock and data management to the caller.


190-242: LGTM!

The update flow correctly captures profile and field data, derives lock names, and applies nested multi-lock contexts around the transactional update and reconciliation.


292-296: LGTM!

The helper correctly derives a namespace-scoped lock name for IPPREFIX mutations, with a clear comment explaining the rationale for branch-wide locking.


313-332: LGTM!

The create flow follows the correct pattern of namespace-scoped locking, transactional creation, and reconciliation.


335-363: LGTM!

The helper correctly propagates lock names and field data, delegating lock and data management to the caller.


367-419: LGTM!

The update flow follows the consistent pattern of nested multi-lock contexts (namespace, then object), ensuring proper lock ordering and transactional integrity.


458-484: LGTM!

The delete flow correctly acquires namespace-scoped locks before performing transactional reconciliation of the deletion.


224-227: Lock ordering confirmed consistent

Both IPADDRESS and IPPREFIX update mutations use nested InfrahubMultiLock contexts that acquire namespace locks before object-specific locks; create/reconcile flows only lock the namespace as intended.

Comment on lines +74 to +89
async def test_lock_names_optional_empty_attribute(
self,
db: InfrahubDatabase,
default_branch,
client,
car_person_schema_unregistered,
):
car_person_schema_unregistered = deepcopy(car_person_schema_unregistered)
car_person_schema_unregistered.nodes[1].uniqueness_constraints = [["height__value"]]
registry.schema.register_schema(schema=car_person_schema_unregistered, branch=default_branch.name)

# Test lock on another branch
other_branch = await create_branch(branch_name="other_branch", db=db)
with patch("infrahub.graphql.mutations.main.InfrahubMultiLock") as mock_infrahub_multi_lock:
group = await client.create(
kind=GRAPHQLQUERYGROUP, name="one_more_group", query=graphql_query, branch=other_branch.name
)
await group.save()
mock_infrahub_multi_lock.assert_called_once_with(
lock_registry=lock.registry, locks=["global.object.CoreGraphQLQueryGroup." + _hash("one_more_group")]
)
schema_branch = registry.schema.get_schema_branch(name=default_branch.name)
person = await create_and_save(db=db, schema="TestPerson", name="John")
assert get_lock_names_on_object_mutation(person, schema_branch=schema_branch) == [
"global.object.TestPerson." + _hash("") + "." + _hash("John")
]
Copy link
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the expected lock name in the test assertion.

The test sets uniqueness_constraints = [["height__value"]] (line 82), which suggests a single constraint on the height attribute. However, the assertion (line 88) expects the lock name to include two hashes: _hash("") (for height) and _hash("John") (for name). This would correspond to either:

  1. Two separate uniqueness constraints: [["height__value"], ["name__value"]].
  2. A single constraint with two attributes: [["height__value", "name__value"]].

If the name attribute's unique=True setting implicitly adds a uniqueness constraint, this would explain the assertion. Please verify whether:

  • Attributes marked unique=True automatically create implicit uniqueness constraints.
  • The test assertion correctly reflects the expected lock name format.

🏁 Script executed:

#!/bin/bash
# Description: Check if unique=True on attributes creates implicit uniqueness constraints.
# Search for schema initialization logic that processes unique attributes.

rg -nP -C5 '(\bunique\s*=\s*True|\buniqueness_constraints\b)' backend/infrahub/core/schema/

Length of output: 46611


🏁 Script executed:

#!/bin/bash
sed -n '680,760p' backend/infrahub/core/schema/schema_branch.py

Length of output: 3909


🏁 Script executed:

#!/bin/bash
rg -nP 'register_schema' -C3 backend/infrahub/core/schema

Length of output: 633


🏁 Script executed:

#!/bin/bash
sed -n '160,260p' backend/infrahub/core/schema/manager.py

Length of output: 4159


🏁 Script executed:

rg -nP 'def process' -n -C20 backend/infrahub/core/schema/schema_branch.py

Length of output: 20761


Update test assertion to match two separate lock names.

After schema registration, height and the implicit unique name attribute each become their own uniqueness constraint, so get_lock_names_on_object_mutation returns two locks, not one combined value. On lines 88–89 of backend/tests/unit/core/test_get_kinds_lock.py, change the assertion to:

assert get_lock_names_on_object_mutation(person, schema_branch=schema_branch) == [
    "global.object.TestPerson." + _hash(""),
    "global.object.TestPerson." + _hash("John"),
]
🤖 Prompt for AI Agents
In backend/tests/unit/core/test_get_kinds_lock.py around lines 74 to 89, the
test currently asserts a single lock name but after schema registration there
are two separate uniqueness constraints (implicit name and height), so update
the assertion on lines 88–89 to expect two lock strings: the first for the empty
attribute value (use "global.object.TestPerson." + _hash("") ) and the second
for the object's name (use "global.object.TestPerson." + _hash("John") );
replace the existing single-item list with a two-item list in that order.

@fatih-acar
Copy link
Contributor

Thanks for reviewing @ajtmccarty

some questions and comments (mostly on things you did not write) and a couple more questions

  1. is it possible to add any sort of graphql-level tests that actually cover the locks as they will be used? seems like it would be hard, but maybe it's possible by creating/updating overlapping objects and checking timestamps of the changes or something
  2. is this change big enough to be on develop instead of stable?
  1. I think yes, but it would be a bit hard, as you suspect. It's pretty easy to stress test this using our internal scale testing suite, since it massively creates objects concurrently (this is how I caught most of the issues resolved in this PR).
  2. I think this should go in develop for the 1.6 release, but we might need to cherry-pick some changes into stable to fix #7254 (number resource pools can allocate the same resource when allocation requests run in parallel)...

fatih-acar and others added 17 commits October 28, 2025 16:04
We should also lock regardless of the branch when allocating from a
resource manager.

Signed-off-by: Fatih Acar <[email protected]>
BROKEN: this does not handle double lock/deadlock issues when we acquire
a resource pool lock in run_constraints_and_save() and then in
get_resource(). We probably require a mechanism to check if the lock is
actually held by the current worker, if so it can proceed without
acquiring the lock.

This reverts commit 31d7940.
Make locks re-entrant within the same asyncio context.

Signed-off-by: Fatih Acar <[email protected]>
WIP: still need to fix the uniqueness constraint lock on update.

Signed-off-by: Fatih Acar <[email protected]>
Fixes "Lock names include attributes values"

Signed-off-by: Fatih Acar <[email protected]>
Previous implementation was wrong due to asyncio context switching.

Signed-off-by: Fatih Acar <[email protected]>
We can enable locking for consistency on branches now that they are more
granular and avoid performance penalty.

Signed-off-by: Fatih Acar <[email protected]>
Signed-off-by: Fatih Acar <[email protected]>
Granular locking produces higher cardinality for lock names, bloating
Prometheus metrics and leading to high memory usage.
Completely disable metrics for these locks, as they would be useless for
the moment.

Signed-off-by: Fatih Acar <[email protected]>
@fatih-acar fatih-acar force-pushed the lgu-lock-on-attributes branch from 56841a5 to 174797c Compare October 28, 2025 15:05
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
backend/infrahub/core/node/__init__.py (2)

260-271: Update docstring to document allocate_resources parameter.

The allocate_resources parameter was added but the docstring was not updated. Per coding guidelines, all parameters should be documented.

Apply this diff to update the docstring:

     async def handle_pool(
         self,
         db: InfrahubDatabase,
         attribute: BaseAttribute,
         errors: list,
         allocate_resources: bool = True,
     ) -> None:
-        """Evaluate if a resource has been requested from a pool and apply the resource
+        """Evaluate if a resource has been requested from a pool and apply the resource.
+
+        Args:
+            db: Database connection.
+            attribute: Attribute to evaluate for pool allocation.
+            errors: List to append validation errors.
+            allocate_resources: Whether to perform pool allocation; when False, pool allocation is skipped.
 
         This method only works on number pools, currently Integer is the only type that has the from_pool
         within the create code.
         """

As per coding guidelines


699-718: Add docstring to document process_pools parameter.

The new method lacks a docstring. Per coding guidelines, all public methods should have docstrings documenting their parameters.

Apply this diff to add a docstring:

     async def new(
         self,
         db: InfrahubDatabase,
         id: str | None = None,
         process_pools: bool = True,
         **kwargs: Any,
     ) -> Self:
+        """Create a new node instance with the provided data.
+
+        Args:
+            db: Database connection.
+            id: Optional UUID for the node; generated if not provided.
+            process_pools: Whether to perform resource pool allocations; when False, pool allocation is skipped.
+            **kwargs: Field values for the node.
+
+        Returns:
+            The initialized node instance.
+        """
         if id and not is_valid_uuid(id):

As per coding guidelines

backend/infrahub/core/node/save.py (1)

12-35: Add Raises section to docstring.

The function raises ValueError when manage_lock is False but lock_names is None (line 38), but this is not documented in the docstring.

Apply this diff:

     Args:
         node: The node instance to validate and persist.
         node_constraint_runner: Runner executing node-level constraints.
         fields_to_validate: Field names that must be validated.
         fields_to_save: Field names that must be persisted.
         db: Database connection or transaction to use for persistence.
         branch: Branch associated with the mutation.
         skip_uniqueness_check: Whether to skip uniqueness constraints.
         lock_names: Precomputed lock identifiers to reuse when ``manage_lock`` is False.
         manage_lock: Whether this helper should acquire and release locks itself.
+
+    Raises:
+        ValueError: When manage_lock is False but lock_names is not provided.
     """

As per coding guidelines

🧹 Nitpick comments (1)
backend/infrahub/graphql/mutations/main.py (1)

213-231: Verify that preview object correctly captures all lock-relevant state.

The preview object is created with process_pools=False to avoid triggering resource allocations during lock computation. Ensure that this approach correctly captures all attributes that contribute to lock names, including:

  • Attributes with uniqueness constraints
  • Resource pool references (even when not allocated)

The current implementation appears sound, but consider adding a comment explaining why process_pools=False is safe here.

Apply this diff to add clarification:

         # Prepare a clone to compute locks without triggering pool allocations
+        # process_pools=False prevents resource allocation but still captures pool references
+        # which are needed for lock name computation
         preview_obj = await NodeManager.get_one_by_id_or_default_filter(
             db=db,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56841a5 and 174797c.

📒 Files selected for processing (10)
  • backend/infrahub/core/attribute.py (2 hunks)
  • backend/infrahub/core/node/__init__.py (9 hunks)
  • backend/infrahub/core/node/create.py (4 hunks)
  • backend/infrahub/core/node/save.py (1 hunks)
  • backend/infrahub/graphql/mutations/ipam.py (10 hunks)
  • backend/infrahub/graphql/mutations/main.py (5 hunks)
  • backend/infrahub/lock.py (7 hunks)
  • backend/infrahub/lock_getter.py (1 hunks)
  • backend/tests/unit/core/test_get_kinds_lock.py (2 hunks)
  • backend/tests/unit/test_lock.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/infrahub/lock_getter.py
  • backend/infrahub/core/attribute.py
🧰 Additional context used
📓 Path-based instructions (3)
backend/tests/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Place backend tests in backend/tests/

Files:

  • backend/tests/unit/test_lock.py
  • backend/tests/unit/core/test_get_kinds_lock.py
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/tests/unit/test_lock.py
  • backend/infrahub/core/node/create.py
  • backend/infrahub/core/node/save.py
  • backend/tests/unit/core/test_get_kinds_lock.py
  • backend/infrahub/core/node/__init__.py
  • backend/infrahub/lock.py
  • backend/infrahub/graphql/mutations/main.py
  • backend/infrahub/graphql/mutations/ipam.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking

**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions with async def
Await asynchronous calls with await
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy

Files:

  • backend/tests/unit/test_lock.py
  • backend/infrahub/core/node/create.py
  • backend/infrahub/core/node/save.py
  • backend/tests/unit/core/test_get_kinds_lock.py
  • backend/infrahub/core/node/__init__.py
  • backend/infrahub/lock.py
  • backend/infrahub/graphql/mutations/main.py
  • backend/infrahub/graphql/mutations/ipam.py
🧬 Code graph analysis (8)
backend/tests/unit/test_lock.py (1)
backend/infrahub/lock.py (2)
  • initialize_lock (297-299)
  • get (266-279)
backend/infrahub/core/node/create.py (6)
backend/infrahub/core/constraint/node/runner.py (1)
  • NodeConstraintRunner (13-43)
backend/infrahub/core/node/__init__.py (6)
  • Node (72-1064)
  • save (816-826)
  • init (213-219)
  • init (223-229)
  • init (232-258)
  • new (699-718)
backend/infrahub/core/node/save.py (2)
  • run_constraints_and_save (12-57)
  • _persist (47-51)
backend/infrahub/dependencies/registry.py (1)
  • get_component_registry (61-62)
backend/infrahub/lock.py (1)
  • InfrahubMultiLock (50-77)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (53-118)
backend/infrahub/core/node/save.py (5)
backend/infrahub/core/constraint/node/runner.py (1)
  • NodeConstraintRunner (13-43)
backend/infrahub/core/node/__init__.py (2)
  • Node (72-1064)
  • save (816-826)
backend/infrahub/lock.py (1)
  • InfrahubMultiLock (50-77)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (53-118)
backend/infrahub/core/node/create.py (1)
  • _persist (218-232)
backend/tests/unit/core/test_get_kinds_lock.py (5)
backend/infrahub/lock_getter.py (3)
  • _get_kinds_to_lock_on_object_mutation (13-44)
  • _hash (47-50)
  • get_lock_names_on_object_mutation (53-118)
backend/tests/helpers/test_app.py (1)
  • TestInfrahubApp (64-222)
backend/tests/node_creation.py (1)
  • create_and_save (8-12)
backend/tests/conftest.py (1)
  • car_person_schema_unregistered (497-563)
backend/infrahub/core/schema/manager.py (1)
  • register_schema (202-209)
backend/infrahub/core/node/__init__.py (2)
backend/infrahub/core/protocols_base.py (3)
  • InfrahubDatabase (35-65)
  • new (84-84)
  • from_graphql (104-104)
backend/infrahub/core/attribute.py (1)
  • from_graphql (581-629)
backend/infrahub/lock.py (1)
backend/infrahub/core/timestamp.py (1)
  • current_timestamp (33-34)
backend/infrahub/graphql/mutations/main.py (5)
backend/infrahub/core/node/__init__.py (5)
  • save (816-826)
  • get_kind (89-91)
  • get_id (93-98)
  • from_graphql (968-996)
  • Node (72-1064)
backend/infrahub/core/node/save.py (1)
  • run_constraints_and_save (12-57)
backend/infrahub/lock.py (1)
  • InfrahubMultiLock (50-77)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (53-118)
backend/infrahub/core/node/create.py (2)
  • get_profile_ids (136-140)
  • refresh_for_profile_update (143-164)
backend/infrahub/graphql/mutations/ipam.py (5)
backend/infrahub/core/node/create.py (1)
  • get_profile_ids (136-140)
backend/infrahub/lock.py (2)
  • InfrahubMultiLock (50-77)
  • build_object_lock_name (302-303)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (53-118)
backend/infrahub/core/node/__init__.py (2)
  • Node (72-1064)
  • from_graphql (968-996)
backend/infrahub/graphql/mutations/main.py (1)
  • mutate_update_object (280-354)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: validate-generated-documentation
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (11)
backend/tests/unit/test_lock.py (1)

65-93: LGTM! Test validates reentrant lock behavior correctly.

The test properly verifies that:

  • Nested acquisitions within the same task succeed without blocking
  • A waiting task cannot acquire the lock until the outermost release
  • Release ordering is correct (inner before outer)
backend/infrahub/graphql/mutations/main.py (2)

233-256: LGTM! Lock orchestration follows correct two-phase pattern.

The implementation correctly:

  1. Computes lock names from the preview object
  2. Acquires locks before entering the transaction
  3. Passes manage_lock=False to avoid double-locking
  4. Handles both transactional and non-transactional database contexts

The lock ordering (acquire before transaction) prevents deadlocks between database locks and application locks.
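
The two-phase pattern can be sketched as a toy illustration; `run_save` and the `manage_lock` flag below are stand-ins for the real Infrahub helpers, not the actual APIs:

```python
import asyncio

# Stand-in for the inner constraints-and-save helper; illustrative only.
async def run_save(log: list[str], manage_lock: bool) -> None:
    if manage_lock:
        log.append("re-acquiring locks")  # would double-lock without reentrancy
    log.append("constraints + save")

async def save_with_locks(app_lock: asyncio.Lock, log: list[str]) -> None:
    async with app_lock:                    # phase 1: app-level locks first
        log.append("locks acquired")
        log.append("transaction open")      # phase 2: DB transaction opens after
        await run_save(log, manage_lock=False)  # inner save must not re-lock
        log.append("transaction commit")

log: list[str] = []
asyncio.run(save_with_locks(asyncio.Lock(), log))
print(log)  # ['locks acquired', 'transaction open', 'constraints + save', 'transaction commit']
```

Passing `manage_lock=False` inward is what keeps a single request from contending with its own locks.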


295-313: LGTM! Docstring properly documents the extended signature.

The docstring clearly explains all new parameters including fields_to_validate, fields, previous_profile_ids, lock_names, manage_lock, and apply_data. This is helpful for understanding the flexible locking semantics.

As per coding guidelines

backend/infrahub/core/node/save.py (1)

37-57: LGTM! Lock management logic is correct.

The function properly:

  1. Validates preconditions (line 37-38)
  2. Derives lock names when needed (lines 40-45)
  3. Conditionally manages locks based on manage_lock flag (lines 53-57)
  4. Uses metrics=False to avoid double-counting when called from higher-level wrappers

This enables flexible reuse in both standalone and nested locking contexts.

backend/infrahub/core/node/create.py (2)

167-196: LGTM! Signature change improves separation of concerns.

The refactored _do_create_node now:

  1. Accepts a pre-initialized obj instead of creating it
  2. Delegates lock management to the caller via manage_lock=False
  3. Focuses on constraint validation and persistence

This design enables the two-phase creation pattern while keeping lock orchestration at the appropriate level.


213-240: Verified: Lock names are correctly computed regardless of process_pools=False.

The from_pool attribute is set from input data (lines 121 and 599 in attribute.py) independently of the process_pools flag. The flag only controls whether handle_pool() is invoked for pool allocation, not whether from_pool values are captured. Since get_lock_names_on_object_mutation() checks for the presence of from_pool attributes (which are populated during preview_obj.new()), the lock names will be correctly computed even with process_pools=False.

The two-phase approach is sound and the implementation is correct.

backend/infrahub/lock.py (2)

164-183: LGTM! Recursion tracking correctly implements reentrant semantics.

The ContextVar-based tracking ensures:

  1. Each async task has its own recursion depth counter (isolated context)
  2. Nested acquisitions increment the counter without blocking
  3. Root acquisition (depth=None) actually acquires the underlying lock
  4. Metrics are only recorded on root acquisition (when metrics=True)

This design correctly prevents deadlocks when the same task needs to acquire a lock it already holds.
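
The described design can be sketched as a minimal reentrant asyncio lock built on a ContextVar depth counter; this is an illustration only, and the real InfrahubLock differs in detail:

```python
import asyncio
from contextvars import ContextVar
from typing import Optional

class ReentrantAsyncLock:
    """Reentrant asyncio lock using a per-task ContextVar depth counter."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        # None = not held by the current task; an int is the nesting depth.
        self._depth: ContextVar[Optional[int]] = ContextVar("depth", default=None)

    async def acquire(self) -> None:
        depth = self._depth.get()
        if depth is None:
            await self._lock.acquire()   # root acquisition takes the real lock
            self._depth.set(1)
        else:
            self._depth.set(depth + 1)   # nested acquisition only increments

    def release(self) -> None:
        depth = self._depth.get()
        if depth is None:
            raise RuntimeError("release() called without holding the lock")
        if depth > 1:
            self._depth.set(depth - 1)   # unwind a nested acquisition
        else:
            self._depth.set(None)        # outermost release frees the lock
            self._lock.release()

async def demo() -> list[str]:
    lk = ReentrantAsyncLock()
    events: list[str] = []
    await lk.acquire()
    await lk.acquire()                   # same task: no deadlock
    events.append("nested ok")
    lk.release()                          # inner release keeps the lock held
    events.append("still held" if lk._lock.locked() else "released early")
    lk.release()                          # outer release actually frees it
    events.append("released" if not lk._lock.locked() else "still locked")
    return events

print(asyncio.run(demo()))  # ['nested ok', 'still held', 'released']
```

Because each asyncio task runs in its own context copy, the depth counter is naturally isolated per task.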


185-206: LGTM! Release logic correctly implements nested unwinding.

The implementation properly:

  1. Raises RuntimeError if releasing without ownership (lines 187-188)
  2. Decrements depth for nested releases without releasing the underlying lock (lines 190-192)
  3. Records metrics and releases the underlying lock only on outermost release (lines 194-206)
  4. Resets the recursion counter to None after full release (line 205)

The reverse-order release in InfrahubMultiLock (line 76) complements this to ensure proper lock ordering.
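
A minimal sketch of that acquire-in-order / release-in-reverse behavior, assuming a plain dict as the lock registry (not the actual InfrahubMultiLock implementation):

```python
import asyncio
from typing import Dict, List

class MultiLock:
    """Acquire named locks in order, release them in reverse (illustrative)."""

    def __init__(self, registry: Dict[str, asyncio.Lock], names: List[str]) -> None:
        self._registry = registry
        self._names = names
        self._acquired: List[asyncio.Lock] = []

    async def __aenter__(self) -> "MultiLock":
        for name in self._names:             # fixed acquisition order
            lk = self._registry.setdefault(name, asyncio.Lock())
            await lk.acquire()
            self._acquired.append(lk)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        for lk in reversed(self._acquired):  # reverse-order release
            lk.release()
        self._acquired.clear()

async def demo() -> bool:
    registry: Dict[str, asyncio.Lock] = {}
    async with MultiLock(registry, ["global.namespace.default", "global.object.x"]):
        held = all(lk.locked() for lk in registry.values())
    released = not any(lk.locked() for lk in registry.values())
    return held and released

print(asyncio.run(demo()))  # True
```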

backend/infrahub/graphql/mutations/ipam.py (2)

144-152: LGTM! Create mutation properly wraps reconciliation in lock scope.

The implementation correctly:

  1. Computes namespace lock names before acquiring locks
  2. Acquires locks before entering the transaction
  3. Performs creation and reconciliation within the locked transaction scope
  4. Converts GraphQL result within the transaction to ensure consistency

163-180: LGTM! Update helper properly delegates to parent with lock management disabled.

The _mutate_update_object_and_reconcile correctly:

  1. Passes all lock-related parameters to parent's mutate_update_object
  2. Sets manage_lock=False to avoid double-locking (line 178)
  3. Sets apply_data=False since data was already applied by the caller (line 179)
  4. Performs reconciliation after constraint validation and persistence

This design enables the outer mutation handler to manage locks while reusing validation logic.

backend/tests/unit/core/test_get_kinds_lock.py (1)

74-89: Investigating the lock generation logic in the codebase and documentation yields the following findings.

Infrahub synchronizes uniqueness_constraints and unique attributes during schema processing: when at least one attribute is marked as unique and human_friendly_id has not been defined, human_friendly_id is created automatically.

From the shell script output showing the lock generation loop, the code iterates for uc in ucs: (line 84) with a fresh uc_attributes_values = [] created per constraint (line 85), indicating each uniqueness constraint generates its own lock entry separately.

The test sets up two uniqueness constraints:

  1. Explicit: height__value (line 82)
  2. Implicit: name (from unique=True on the name attribute)

Yet the current assertion (lines 88-89) expects a single combined hash string rather than two separate lock names.


Test assertion appears to expect incorrect lock format for multiple uniqueness constraints.

The test creates a person with two uniqueness constraints (explicit height__value and implicit name from unique=True). Based on the lock generation logic where each constraint is processed separately in a loop, the assertion should expect a list with two separate lock name strings, not one concatenated string with two hashes.

Verify that the expected assertion should be:

assert get_lock_names_on_object_mutation(person, schema_branch=schema_branch) == [
    "global.object.TestPerson." + _hash(""),
    "global.object.TestPerson." + _hash("John"),
]
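
For illustration, lock names of this shape can be built by hashing the attribute value into a stable suffix; `_hash` below is a hypothetical MD5-based stand-in and may differ from the real lock_getter._hash:

```python
import hashlib

def _hash(value: str) -> str:
    # Hypothetical stand-in for lock_getter._hash; the real function may differ.
    return hashlib.md5(value.encode()).hexdigest()

def build_lock_name(kind: str, value: str) -> str:
    # One lock name per uniqueness-constraint value, scoped to the object kind.
    return f"global.object.{kind}.{_hash(value)}"

print(build_lock_name("TestPerson", "John"))
```

Hashing the value (rather than embedding it verbatim) keeps lock names bounded in length and free of unsafe characters.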

Comment on lines +225 to 241
         async with InfrahubMultiLock(lock_registry=lock.registry, locks=namespace_lock_names):
+            async with InfrahubMultiLock(lock_registry=lock.registry, locks=lock_names, metrics=False):
                 async with db.start_transaction() as dbt:
                     reconciled_address = await cls._mutate_update_object_and_reconcile(
-                        info=info, data=data, branch=branch, address=address, namespace_id=namespace_id, db=dbt
+                        info=info,
+                        data=data,
+                        branch=branch,
+                        db=dbt,
+                        address=address,
+                        namespace_id=namespace_id,
+                        fields_to_validate=fields_to_validate,
+                        fields=fields,
+                        previous_profile_ids=before_mutate_profile_ids,
+                        lock_names=lock_names,
                     )
-        else:
-            reconciled_address = await cls._mutate_update_object_and_reconcile(
-                info=info, data=data, branch=branch, address=address, namespace_id=namespace_id, db=dbt
-            )

-            result = await cls.mutate_update_to_graphql(db=dbt, info=info, obj=reconciled_address)
+                    result = await cls.mutate_update_to_graphql(db=dbt, info=info, obj=reconciled_address)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify consistent lock ordering across all IPAM mutations to prevent deadlocks.

The nested multi-lock pattern acquires:

  1. Namespace lock (line 225)
  2. Object lock (line 226)
  3. Transaction (line 227)

Ensure this ordering is consistent across:

  • IPAddress create/update/delete
  • IPPrefix create/update/delete

If different mutations acquire these locks in different orders, deadlocks can occur. The current implementation appears consistent, but verify that all code paths follow: namespace → object → transaction.

Run the following script to check lock ordering consistency:


🏁 Script executed:

#!/bin/bash
# Description: Verify consistent lock ordering in all IPAM mutations

# Find all InfrahubMultiLock usage in IPAM mutations
rg -nP -B2 -A8 'InfrahubMultiLock.*namespace' backend/infrahub/graphql/mutations/ipam.py

# Check for any mutations that might acquire locks in different order
rg -nP 'get_lock_names_on_object_mutation.*namespace|namespace.*get_lock_names_on_object_mutation' backend/infrahub/graphql/mutations/ipam.py

Length of output: 1273


🏁 Script executed:

#!/bin/bash
# Find all mutation methods in the IPAM mutations file

# Get all method names that are likely mutations
rg -nP 'async def (mutate_|create_|update_|delete_)' backend/infrahub/graphql/mutations/ipam.py

# For each IPAddress and IPPrefix, check their create, update, delete patterns
echo "=== IPAddress mutations ==="
rg -nP 'class.*IPAddress' backend/infrahub/graphql/mutations/ipam.py -A 50 | head -100

echo "=== IPPrefix mutations ==="
rg -nP 'class.*IPPrefix' backend/infrahub/graphql/mutations/ipam.py -A 50 | head -100

Length of output: 4536


🏁 Script executed:

#!/bin/bash
# Find all lock acquisitions and their ordering in the file
rg -nP 'InfrahubMultiLock|start_transaction' backend/infrahub/graphql/mutations/ipam.py | head -50

Length of output: 1065


🏁 Script executed:

#!/bin/bash
# Look for any alternative lock patterns or deviations
rg -nP -B5 'start_transaction' backend/infrahub/graphql/mutations/ipam.py | grep -E 'InfrahubMultiLock|start_transaction|async def' | head -60

Length of output: 994


🏁 Script executed:

#!/bin/bash
# Check the upsert mutation implementations for lock patterns
rg -nP -B2 -A15 'async def mutate_upsert' backend/infrahub/graphql/mutations/ipam.py | grep -E 'async def mutate_upsert|InfrahubMultiLock|start_transaction'

Length of output: 65


🏁 Script executed:

#!/bin/bash
# Get more context around the create mutation to understand why it only uses object lock
rg -nP -B5 -A20 'async def mutate_create' backend/infrahub/graphql/mutations/ipam.py | head -80

Length of output: 2399


🏁 Script executed:

#!/bin/bash
# Check if there are any comments or documentation explaining the lock strategy
rg -nP 'namespace.*lock|lock.*namespace' backend/infrahub/graphql/mutations/ipam.py -B2 -A2

Length of output: 2439


🏁 Script executed:

#!/bin/bash
# Get full mutate_delete implementation for IPAddress
rg -nP -B2 -A25 'async def mutate_delete' backend/infrahub/graphql/mutations/ipam.py | head -60

Length of output: 2090


🏁 Script executed:

#!/bin/bash
# Get full mutate_upsert implementation
rg -nP -B2 -A30 'async def mutate_upsert' backend/infrahub/graphql/mutations/ipam.py | head -100

Length of output: 2525


🏁 Script executed:

#!/bin/bash
# Check what get_lock_names_on_object_mutation returns to understand the difference
rg -nP 'get_lock_names_on_object_mutation' backend/infrahub/graphql/mutations/ipam.py -B3 -A3

Length of output: 1158


🏁 Script executed:

#!/bin/bash
# Find the parent class InfrahubMutationMixin and its lock patterns
rg -nP 'class InfrahubMutationMixin' backend/infrahub/graphql/mutations/ -A 3

Length of output: 265


🏁 Script executed:

#!/bin/bash
# Search for mutate_delete and mutate_upsert in the parent mutation class
fd -e py 'main.py' backend/infrahub/graphql/mutations/ -x grep -l 'mutate_delete\|mutate_upsert'

Length of output: 42


🏁 Script executed:

#!/bin/bash
# Look for lock patterns in the main mutation file
rg -nP 'InfrahubMultiLock|start_transaction' backend/infrahub/graphql/mutations/main.py | head -30

Length of output: 271


🏁 Script executed:

#!/bin/bash
# Get full context of mutate_delete in main.py
rg -nP -B5 -A25 'async def mutate_delete' backend/infrahub/graphql/mutations/main.py | head -80

Length of output: 1077


🏁 Script executed:

#!/bin/bash
# Get full context of mutate_upsert in main.py
rg -nP -B5 -A25 'async def mutate_upsert' backend/infrahub/graphql/mutations/main.py | head -80

Length of output: 1421


🏁 Script executed:

#!/bin/bash
# Get context around line 251 to see the lock pattern in parent class
rg -nP -B10 -A10 'InfrahubMultiLock.*lock_registry' backend/infrahub/graphql/mutations/main.py | head -50

Length of output: 873


🏁 Script executed:

#!/bin/bash
# Get the FULL mutate_upsert implementation including lock patterns
rg -nP -B5 -A50 'async def mutate_upsert' backend/infrahub/graphql/mutations/main.py | head -100

Length of output: 2423


🏁 Script executed:

#!/bin/bash
# Check if there's a difference in how update and create acquire locks
rg -nP -B3 -A3 'get_lock_names_on_object_mutation' backend/infrahub/graphql/mutations/main.py

Length of output: 1004


🏁 Script executed:

#!/bin/bash
# Find mutate_create in main.py to see if it acquires locks
rg -nP -B5 -A40 'async def mutate_create' backend/infrahub/graphql/mutations/main.py | head -100

Length of output: 3482


🏁 Script executed:

#!/bin/bash
# Check the _call_mutate_update to understand the full flow
rg -nP -B5 -A40 '_call_mutate_update' backend/infrahub/graphql/mutations/main.py | head -120

Length of output: 5089


🏁 Script executed:

#!/bin/bash
# Verify what locks are acquired in the parent update vs IPAM update
rg -nP -B15 'async with InfrahubMultiLock' backend/infrahub/graphql/mutations/main.py | head -60

Length of output: 744


Lock ordering is inconsistent—create/delete bypass IPAM-specific locks used in update.

Verification reveals:

  • mutate_update (lines 225-227, 402-404): namespace lock → object lock → transaction
  • mutate_create (lines 145, 325): namespace lock → transaction (missing object lock)
  • mutate_delete: delegates to parent (no IPAM locks); parent acquires only transaction
  • mutate_upsert: delegates to parent, routing to create or update inconsistently

The update operations correctly acquire both namespace and object locks via get_lock_names_on_object_mutation(). However, create and delete do not follow this pattern and bypass object-level locks entirely. If concurrent operations mix these paths, especially given the cardinality-one relationships noted in the IPPrefix locks (lines 293-295), deadlocks or race conditions are possible. Align all mutations on the same two-lock pattern: namespace → object → transaction.
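
One way to guarantee a single global acquisition order is a small helper that every mutation path funnels through; this is a hypothetical sketch, not part of the codebase:

```python
from typing import Iterable, List

def ordered_lock_names(namespace_locks: Iterable[str], object_locks: Iterable[str]) -> List[str]:
    # Namespace locks always come first, then object locks; each group is
    # sorted so every mutation acquires locks in the same global order,
    # which is the standard way to rule out lock-ordering deadlocks.
    return sorted(set(namespace_locks)) + sorted(set(object_locks))

print(ordered_lock_names(
    ["global.namespace.default"],
    ["global.object.IpamIPAddress.b", "global.object.IpamIPAddress.a"],
))
# ['global.namespace.default', 'global.object.IpamIPAddress.a', 'global.object.IpamIPAddress.b']
```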

🤖 Prompt for AI Agents
In backend/infrahub/graphql/mutations/ipam.py around lines 145–404,
create/delete/upsert paths do not follow the same lock ordering as update
(namespace → object → transaction), which risks races/deadlocks; change
mutate_create, mutate_delete (and upsert delegations) to compute object lock
names with get_lock_names_on_object_mutation() and wrap operations in the same
nested InfrahubMultiLock sequence used by mutate_update (first namespace locks,
then object locks, then start_transaction) so all mutation flows consistently
acquire namespace then object locks before starting the DB transaction.

@fatih-acar
Contributor

Replaced by #7508

@fatih-acar fatih-acar closed this Oct 29, 2025

Labels

group/backend Issue related to the backend (API Server, Git Agent)

Projects

None yet

4 participants